44 research outputs found

    PhysBinder: improving the prediction of transcription factor binding sites by flexible inclusion of biophysical properties

    The most important mechanism in the regulation of transcription is the binding of a transcription factor (TF) to a DNA sequence called the TF binding site (TFBS). Most binding sites are short and degenerate, which makes predictions based on their primary sequence alone somewhat unreliable. We present a new web tool that implements a flexible and extensible algorithm for predicting TFBS. The algorithm makes use of both direct (the sequence) and several indirect readout features of protein-DNA complexes (biophysical properties such as bendability or the solvent-excluded surface of the DNA). This algorithm significantly outperforms state-of-the-art approaches for in silico identification of TFBS. Users can submit FASTA sequences for analysis in the PhysBinder integrative algorithm and choose from >60 different TF-binding models. The results of this analysis can be used to plan and steer wet-lab experiments. The PhysBinder web tool is freely available at http://bioit.dmbr.ugent.be/physbinder/index.php

    A distance difference matrix approach to identifying transcription factors that regulate differential gene expression

    A distance difference matrix method is presented for identifying the binding sites of secondary transcription factors that are responsible for the different responses of the target genes of one transcription factor.
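The general idea of a distance difference matrix can be sketched as follows: compute a pairwise distance matrix between TFBS models within each gene set, then subtract one matrix from the other so that entries far from zero mark model pairs that behave differently in the two sets. This is only an illustration under stated assumptions. The input layout (rows are promoters, columns are predicted hit counts per TFBS model), the root-mean-square distance, and all function names are hypothetical and are not taken from the paper.

```python
import math


def column(matrix, j):
    """Return column j of a row-major matrix (list of rows)."""
    return [row[j] for row in matrix]


def distance_matrix(hits):
    """Pairwise distances between TFBS models within one gene set.

    `hits` is a hypothetical table: one row per promoter, one column per
    TFBS model, each cell a predicted hit count. The root-mean-square
    distance used here is an assumption chosen so that gene sets of
    different sizes remain comparable.
    """
    n_models = len(hits[0])
    cols = [column(hits, j) for j in range(n_models)]
    n_promoters = len(hits)
    return [
        [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(cols[i], cols[j])) / n_promoters)
            for j in range(n_models)
        ]
        for i in range(n_models)
    ]


def distance_difference(hits_a, hits_b):
    """Element-wise difference of the two per-set distance matrices."""
    da = distance_matrix(hits_a)
    db = distance_matrix(hits_b)
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(da, db)]


# Toy data: three TFBS models, two promoters in set A and three in set B.
SET_A = [[1, 1, 0], [2, 2, 0]]
SET_B = [[1, 0, 0], [2, 0, 0], [3, 0, 0]]
DDM = distance_difference(SET_A, SET_B)
```

In this toy example, models 0 and 1 co-occur in set A but not in set B, so their entry in `DDM` is negative, while the diagonal stays at zero.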

    ConTra v2: a tool to identify transcription factor binding sites across species, update 2011

    Transcription factors are important gene regulators with distinctive roles in development, cell signaling and cell cycling, and they have been associated with many diseases. The ConTra v2 web server allows easy visualization and exploration of predicted transcription factor binding sites in any genomic region surrounding coding or non-coding genes. In this new version, users can choose from nine reference organisms ranging from human to yeast. ConTra v2 can analyze promoter regions, 5′-UTRs, 3′-UTRs and introns or any other genomic region of interest. Hundreds of position weight matrices are available to choose from, but the user can also upload any other matrices for detecting specific binding sites. A typical analysis is run in four simple steps of choosing the gene, the transcript, the region of interest and then selecting one or more transcription factor binding sites. The ConTra v2 web server is freely available at http://bioit.dmbr.ugent.be/contrav2/index.php

    ConTra: a promoter alignment analysis tool for identification of transcription factor binding sites across species

    Transcription factors (TFs) are key components in signaling pathways, and the presence of their binding sites in the promoter regions of DNA is essential for their regulation of the expression of the corresponding genes. Orthologous promoter sequences are commonly used to increase the specificity with which potentially functional transcription factor binding sites (TFBSs) are recognized and to detect possibly important similarities or differences between the different species. The ConTra (conserved TFBSs) web server provides the biologist at the bench with a user-friendly tool to interactively visualize TFBSs predicted using either TransFac (1) or JASPAR (2) position weight matrix libraries, on a promoter alignment of choice. The visualization can be preceded by a simple scoring analysis to explore which TFs are the most likely to bind to the promoter of interest. The ConTra web server is available at http://bioit.dmbr.ugent.be/ConTra/index.php
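Scoring a promoter with a position weight matrix can be illustrated with a minimal sketch. It converts a count matrix to log-odds scores and slides it along a sequence. The count matrix, the uniform background, the pseudocount, and the function names are all hypothetical; they do not come from TransFac, JASPAR, or the ConTra implementation.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position count matrix (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background (assumed)."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + len(BASES) * pseudocount
        pwm.append(
            {b: math.log2((col[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def scan(sequence, pwm):
    """Score every window of motif width; skip windows with non-ACGT symbols."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_hit(sequence, pwm):
    """Return the (start, window, score) tuple with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[2])


PWM = to_log_odds(COUNTS)
```

Calling `best_hit("TTACGTGG", PWM)` picks the window `ACGT` at offset 2, which matches the most frequent base at every position of the toy matrix. Ranking several matrices by their best score on one promoter is one simple way to approximate the kind of scoring analysis the abstract mentions.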

    A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

    Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We use the random forest algorithm to flexibly exploit both types of information. Our results show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that the combined method improves classification accuracy for all five compared with other approaches. Additionally, we contrast these results with those for seven smaller prokaryotic sets with high-quality data and show that the use of high-quality data significantly improves prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
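One way to give a classifier both sequence identity and sequence-dependent structure is to concatenate the two kinds of features into a single vector per candidate site. The sketch below shows that encoding step only. The dinucleotide scale is a placeholder (1.0 for dinucleotides made of A and T, 0.0 otherwise), not a measured biophysical property, and the function names are hypothetical; the paper's actual feature set is not reproduced here.

```python
BASES = "ACGT"

# Placeholder dinucleotide scale standing in for a real structural property
# such as bendability. These values are NOT measurements.
PLACEHOLDER_SCALE = {
    a + b: 1.0 if a in "AT" and b in "AT" else 0.0
    for a in BASES
    for b in BASES
}


def one_hot(site):
    """Encode each base as four 0/1 indicators in A, C, G, T order."""
    features = []
    for base in site:
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def dinucleotide_profile(site, scale):
    """Look up one structural value per overlapping dinucleotide step."""
    return [scale[site[i:i + 2]] for i in range(len(site) - 1)]


def encode(site, scale=PLACEHOLDER_SCALE):
    """Concatenate sequence-identity and structure-derived features."""
    site = site.upper()
    return one_hot(site) + dinucleotide_profile(site, scale)
```

For a site of length n this yields 4n identity features plus n - 1 structural features. Vectors like these, labeled as bound or unbound, could then be passed to any random forest implementation; because tree-based models split on individual features, they can pick up dependencies between positions without those being specified in advance.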

    In silico approaches to studying transcriptional gene regulation: prediction of transcription factor binding sites and applications thereof

    Transcription factor binding sites (TFBSs) are DNA sequences of 6 to 15 base pairs, and their interaction with their binding partners, the transcription factors (TFs), largely determines the observed spatiotemporal gene expression patterns. Accurate in silico identification of TFBSs could thus provide valuable support for research on transcriptional gene regulation, but this has proved to be a difficult task, partly due to a lack of centralized, useful data. Tools that use noisy predictions of TFBSs, however, can already aid in unraveling gene regulatory networks.
    • Many DNA sites are experimentally proven to be bound by a TF, but they are scattered throughout the scientific literature. I joined a community-based effort to tackle the shortage of TFBS data. Collecting these sites and storing them in a central place was necessary to make any progress in modeling the DNA binding specificity of TFs and to study transcriptional gene regulatory mechanisms. Before, during and after the three-day RegCreative jamboree, which was organized in our department (29 November to 1 December 2006), new records were added to the new database ORegAnno. Furthermore, ontologies were discussed, as well as text-mining strategies for automating data curation. In those discussions, the approach of ORegAnno was taken as a reference point. The database was updated to contain more data and was given a publication queue that consists of papers with high potential for successful curation of one or more regulatory regions.
    • I helped to introduce a method that considers two sets of genes that are differentially expressed under the same environmental conditions (tissue or cell type, addition of a TF or other impulse). Such sets of genes can typically be derived from microarray experiments. The method is based on the distance difference matrix concept and simultaneously integrates statistical overrepresentation and co-occurrence of predicted TFBSs in the promoters of the genes, in order to find the secondary TFs responsible for the differential expression. A web interface to our DDM-MDS method is available at http://bioit.dmbr.ugent.be/TFdiff/.
    • Orthologous promoter sequences are commonly used to increase the specificity with which potentially functional TFBSs are recognized and to detect possibly important similarities or differences between different species. We developed ConTra (conserved TFBSs), a user-friendly web tool that allows the biologist at the bench to interactively visualize TFBSs predicted using position weight matrix (PWM) libraries, on a promoter alignment of choice. The visualization can be preceded by a simple scoring analysis to explore which TFs are the most likely to bind to the promoter of interest. The ConTra web server is available at http://bioit.dmbr.ugent.be/ConTra/.
    • We determined the value of using DNA structural information in sequence-based prediction of TFBSs. Based on the random forest (RF) algorithm, we created a method that utilizes DNA-sequence-dependent structural information in a flexible way. We qualitatively compared the classification accuracy of this so-called biophysical method with the accuracy of methods that use nucleotide identity information only, namely, the widely used PWM method and a so-called NPD method, which models nucleotide dependencies between positions with the same RF algorithm. Our results for five TFs with different DNA-binding domains show that the biophysical method alone performs surprisingly well. It complements the NPD method and the PWM method to some extent, and combining all three methods yields a classification accuracy that is higher than that of any single method.